Data Visualization

Cory Whitney

Data visualization: getting stuck

  • Open RStudio

  • Type ? followed by a function, package or data set name in the R console (e.g. ?barplot)

  • Copy the error message into a web search and add "R" to the query

  • Help > Cheatsheets > Data Visualization with ggplot2
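
The help tools listed above can be tried directly in the console. This is a minimal sketch, assuming ggplot2 is installed; barplot and geom_point are only example topics.

```r
# Open the help page for a function; ?barplot is shorthand for help("barplot")
?barplot

# Help for a function in a package that is installed but not yet attached
help("geom_point", package = "ggplot2")

# Show a function's arguments without opening the full help page
args(barplot)

# List the data sets that ship with a package
data(package = "ggplot2")
```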

Data visualization: getting help

  • Many talented programmers use R
  • Some of them scan the web and answer questions

https://stackoverflow.com/

Getting your data in R

Load data

  • Load the data
participants_data <- read.csv("participants_data.csv")
  • Keep your data in the same folder as the .Rproj file, or in a folder below it, so that the relative file path works
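
If read.csv() reports that it cannot open the file, it often helps to check where R is looking. The sketch below is self-contained: it writes a small toy file to a temporary folder first. The column names follow the plots later in this tutorial, but the values (including the "Y"/"N" coding) are invented for illustration and are not the course data.

```r
# Hypothetical toy data standing in for participants_data.csv;
# the values are invented, only the column names come from the tutorial
toy <- data.frame(
  letters_in_first_name  = c(4, 6, 5, 9),
  days_to_email_response = c(1, 3, 2, 1),
  academic_parents       = c("Y", "N", "Y", "N"),
  working_hours_per_day  = c(8, 6, 9, 7)
)

# Write it to a temporary folder so the example runs anywhere
csv_path <- file.path(tempdir(), "participants_data.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Check where R resolves relative paths and whether the file exists
getwd()
file.exists(csv_path)

# Read the file and inspect the result
participants_data <- read.csv(csv_path)
str(participants_data)   # column names and types
head(participants_data)  # first rows
```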

Creating a barplot in base R

R has several systems for making graphs

  • Base R
  • Create a barplot with the table() and barplot() functions
participants_barplot <- table(participants_data$academic_parents)

barplot(participants_barplot)

Figure: Bar plot of the number of observations of binary data related to academic parents
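
To see what table() passes on to barplot(), here is a small sketch with an invented vector standing in for participants_data$academic_parents (the "Y"/"N" coding is an assumption). It also adds a title and axis labels through the main, xlab and ylab arguments.

```r
# Invented answers standing in for participants_data$academic_parents
academic_parents <- c("Y", "N", "N", "Y", "N", "Y", "Y")

# table() counts how often each value occurs
counts <- table(academic_parents)
counts

# barplot() draws one bar per count; main, xlab and ylab add labels
barplot(counts,
        main = "Participants by academic parents",
        xlab = "Academic parents",
        ylab = "Number of participants")
```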

ggplot2: 'Grammar of Graphics' Overview

Many libraries and functions for graphs in R…

  • ggplot2 is one of the most elegant and most versatile.

  • ggplot2 implements the grammar of graphics to describe and build graphs.

  • Do more and do it faster by learning one system and applying it in many places.

  • Learn more about ggplot2 in “The Layered Grammar of Graphics”

http://vita.had.co.nz/papers/layered-grammar.pdf
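
The layered idea can be sketched by building a plot one component at a time. This example uses the built-in iris data; the object name p and the axis label text are illustrative choices.

```r
library(ggplot2)

# 1. Data and aesthetic mappings: which columns go on which axes
p <- ggplot(data = iris,
            aes(x = Sepal.Length, y = Petal.Length))

# 2. Add a geom layer: how each observation is drawn
p <- p + geom_point()

# 3. Add further components with +, here axis labels
p <- p + labs(x = "Sepal length (cm)", y = "Petal length (cm)")

# Printing the object draws the plot
p
```

Storing the plot in an object makes it easy to add or swap layers later without retyping the whole call.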

ggplot2: names and email

Example from your data

library(ggplot2)
ggplot(data = participants_data, 
       aes(x = letters_in_first_name, 
           y = days_to_email_response)) + 
  geom_point()

Figure: Scatterplot of days to email response as a function of the letters in your first name

Want to understand how all the pieces fit together? See the R for Data Science book: http://r4ds.had.co.nz/

ggplot2: add color and size

ggplot(data = participants_data, 
       aes(x = letters_in_first_name, 
           y = days_to_email_response, 
           color = academic_parents, 
           size = working_hours_per_day)) + 
  geom_point()

Figure: Scatterplot of days to email response as a function of the letters in your first name, with colors representing binary data related to academic parents and working hours per day as bubble sizes.

Make more graphs

ggplot2: iris data

Example from Anderson's iris data set

ggplot(data = iris, 
       aes(x = Sepal.Length, 
           y = Petal.Length, 
           color = Species, 
           size = Petal.Width))+ 
  geom_point()

Figure: Scatterplot of iris petal length as a function of sepal length with colors representing iris species and petal width as bubble sizes.

ggplot2: diamonds color shape

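Create a smaller diamonds data set (about 100 rows) for a scatterplot with carat on the x-axis, price on the y-axis and the color of the diamond as the color of the points.

# top_n() comes from dplyr. Without wt it ranks by the last
# column (z) and keeps ties, so it can return more than n rows.
library(dplyr)
dsmall <- top_n(diamonds, n = 100)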

ggplot(data = dsmall, aes(x = carat, 
                          y = price, 
                          color = color)) + 
  geom_point()

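
Which diamonds end up in the subset depends on how the rows are picked. This is a minimal sketch, assuming dplyr is installed, that compares top_n() with head(); the names d_top and d_first are illustrative only.

```r
library(ggplot2)  # provides the diamonds data
library(dplyr)    # provides top_n()

# top_n() keeps the rows with the largest values of wt (here z);
# tied values are all kept, so the result can exceed n rows
d_top <- top_n(diamonds, n = 100, wt = z)
nrow(d_top)

# head() keeps the first n rows in their stored order
d_first <- head(diamonds, n = 100)
nrow(d_first)

# The two subsets cover very different carat ranges
range(d_top$carat)
range(d_first$carat)
```

Because top_n() ranks by a column, its subset contains only the largest stones and is not representative of the full data set.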

ggplot2: geom options

The geom_*() functions define the type of plot, e.g. points, line, boxplot, path or smooth. Geoms can also be combined by adding them with +.

# Create a smaller data set of diamonds with about 50 rows
# (top_n() comes from dplyr and keeps ties).
library(dplyr)
dsmall <- top_n(diamonds,
                n = 50)
# Create a scatterplot and smoothed conditional
# means overlay with carat on the x-axis
# and price on the y-axis.
ggplot(data = dsmall,
       aes(x = carat,
           y = price)) +
  geom_point() +
  geom_smooth()


ggplot2: smooth function

By default, geom_smooth() selects a smoothing method based on the number of observations (loess for fewer than 1,000, otherwise a generalized additive model). Use method = to specify your preferred smoothing method.

# Create a smaller data set of diamonds with about 50 rows
# (top_n() comes from dplyr and keeps ties).
library(dplyr)
dsmall <- top_n(diamonds,
                n = 50)
# Create a scatterplot and smoothed conditional
# means overlay with carat on the x-axis
# and price on the y-axis.
# Use 'glm' as the option for the smoothing
ggplot(data = dsmall,
       aes(x = carat,
           y = price)) +
  geom_point() +
  geom_smooth(method = 'glm')

Figure: ggplot2 lines and smoothing options
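
To compare smoothing choices, two geom_smooth() layers can be drawn on the same plot. The sketch below is one possible comparison, not the tutorial's own: it uses head() for the subset to stay short, and the quadratic formula and the colors are assumptions chosen for illustration. With default settings, 'glm' fits the same straight line as 'lm'.

```r
library(ggplot2)

# First 50 rows of diamonds, used here only to keep the sketch short
dsmall <- head(diamonds, n = 50)

p <- ggplot(data = dsmall, aes(x = carat, y = price)) +
  geom_point() +
  # straight-line fit, confidence band switched off
  geom_smooth(method = "lm", formula = y ~ x,
              se = FALSE, color = "red") +
  # quadratic fit through the same points
  geom_smooth(method = "lm", formula = y ~ poly(x, 2),
              se = FALSE, color = "blue")

p
```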

ggplot2: boxplots

  • Use geom_boxplot() to draw boxplots.
# Create a boxplot where the x-axis is cut and
#  the y-axis is price divided by carat
ggplot(data = diamonds,
       aes(x = cut,
           y = price/carat)) +
  geom_boxplot()


ggplot2: jitter points

  • geom_jitter() adds a small random offset to each point so that all points are shown.
# Create a jittered boxplot where the x-axis is color and
#  the y-axis is price divided by carat
ggplot(data = diamonds,
       aes(x = color,
           y = price/carat)) +
  geom_boxplot() +
  geom_jitter()

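
With more than 50,000 diamonds, jittered points can hide the boxes. One possible adjustment, sketched here with assumed values for width and alpha, is to draw narrow, semi-transparent points first and put an unfilled boxplot on top.

```r
library(ggplot2)

p <- ggplot(data = diamonds,
            aes(x = color, y = price / carat)) +
  # semi-transparent points with a narrow horizontal spread, drawn first
  geom_jitter(width = 0.2, alpha = 0.1) +
  # unfilled boxplot on top; outlier.shape = NA hides the outlier
  # markers, which the jittered points already show
  geom_boxplot(outlier.shape = NA, fill = NA)

p
```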

Your turn to perform

After you have gone through the tutorial, please do the following exercises.

  • Create a scatterplot, bar chart and boxplot (as above)
  • Vary the sample and run the same analysis and plots
  • Save your most interesting figure and share it with us
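
One way to vary the sample and save a figure is sketched below. It assumes dplyr 1.0.0 or later for slice_sample(); the seed, the sample size, and the file name are arbitrary choices, and the file is written to a temporary folder so the example runs anywhere.

```r
library(ggplot2)
library(dplyr)

# Draw a random sample of 200 diamonds; change the seed or n to vary it
set.seed(42)
dsample <- slice_sample(diamonds, n = 200)

p <- ggplot(data = dsample,
            aes(x = carat, y = price, color = color)) +
  geom_point()

# Save the plot; ggsave() picks the file type from the extension,
# so ".png" instead of ".pdf" would write a PNG (sizes are in inches)
out_file <- file.path(tempdir(), "diamonds_sample.pdf")
ggsave(filename = out_file, plot = p, width = 6, height = 4)
```

To keep the figure with your project, replace tempdir() with a folder at or below the level of the .Rproj file.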